Advanced data visualization with R and ggplot2



This practical follows the previous basic introduction to ggplot2. It goes further with ggplot2: annotation, theme customization, color palettes, output formats, scales, and more.

Get ready


The following libraries are needed throughout the practical. Install them with install.packages() if you do not have them already, then load them with library().
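One possible way to handle the installation step is to check which packages are missing before installing anything. The sketch below is only an illustration: the variable names (needed, is_installed, missing_pkgs) are hypothetical, and the install.packages() call is left commented out so that nothing gets installed unless you decide to.

```r
# Packages used throughout the practical
needed <- c("ggplot2", "dplyr", "hrbrthemes", "viridis", "plotly")

# requireNamespace() returns TRUE when a package is installed,
# without attaching it to the search path
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)   # uncomment to install them
} else {
  message("All packages are already installed.")
}
```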

# Load the libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)
library(plotly)

1- General appearance


→ Titles


Q1.1 - The code below builds a basic histogram of Airbnb apartment prices on the French Riviera. It shows only values under 300 euros. Add code to:

  • add a title with ggtitle()
  • change axis labels with xlab() and ylab()
  • change axis limits with xlim() and ylim()
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    geom_histogram() +
    ...
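As an illustration of these five functions, here is a minimal sketch on a different dataset, so that the exercise above stays open. It assumes the built-in faithful dataset rather than the Airbnb prices, and the labels and limits are arbitrary choices.

```r
library(ggplot2)

# faithful ships with R: waiting time (in minutes) between eruptions
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5) +
  ggtitle("Waiting time between Old Faithful eruptions") +   # chart title
  xlab("Waiting time (minutes)") +                            # X axis label
  ylab("Number of eruptions") +                               # Y axis label
  xlim(40, 100) +                                             # X axis limits
  ylim(0, 100)                                                # Y axis limits
p
```

Note that xlim() and ylim() remove the observations that fall outside the limits (with a warning). To zoom without dropping data, coord_cartesian() is the usual alternative.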

→ Chart components

All ggplot2 chart components can be changed using the theme() function. You can see a complete list of components in the official documentation.

Note: each type of component is changed with a dedicated function: element_text() for text, element_line() for lines, and so on.
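To make the link between components and element functions concrete, here is a small sketch. It assumes the built-in mtcars dataset, and the sizes and colors are arbitrary; it is one possible combination, not a recommended style.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  ggtitle("Fuel consumption") +
  theme(
    # text components are set with element_text()
    plot.title = element_text(size = 16, color = "steelblue"),
    axis.title.x = element_text(size = 12, color = "grey30"),
    # line components are set with element_line()
    panel.grid.major = element_line(color = "grey80", linetype = "dashed"),
    # element_blank() removes a component entirely
    panel.grid.minor = element_blank(),
    # rectangle components are set with element_rect()
    panel.background = element_rect(fill = "white")
  )
p
```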




Q1.2 - Reproduce the previous histogram and change:

  • plot title size and color with plot.title
  • X axis title size and color with axis.title.x
  • Grid appearance with panel.grid.major
# Make the histogram
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    ... +
    theme(
      plot.title = element_text(size=..., color=...),
      ...,
      ...
    )

→ Themes


Q1.3 - ggplot2 offers a set of pre-built themes. Try the following to see which one you like the most:

  • theme_bw()
  • theme_dark()
  • theme_minimal()
  • theme_classic()

See a complete list here.

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram
data %>%
  ... +
    theme_classic()
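A convenient way to compare themes is to store the chart once and add each theme to it. The sketch below assumes the built-in faithful dataset; the object names (base_plot, themes, plots) are hypothetical.

```r
library(ggplot2)

# Chart stored without any theme applied
base_plot <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, fill = "#69b3a2", color = "white")

# The four pre-built themes listed above
themes <- list(
  bw      = theme_bw(),
  dark    = theme_dark(),
  minimal = theme_minimal(),
  classic = theme_classic()
)

# One version of the chart per theme, titled with the theme name
plots <- lapply(names(themes), function(theme_name) {
  base_plot + themes[[theme_name]] + ggtitle(paste0("theme_", theme_name, "()"))
})
names(plots) <- names(themes)

# Display one of them
plots$minimal
```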




Q1.4 - The hrbrthemes package provides my favourite style. Install the package, load it, and apply theme_ipsum(). Documentation is here.

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    stat_bin(breaks=seq(0,300,10), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
    ggtitle("Night price distribution of Airbnb apartments") +
    theme_ipsum()

2- Annotation


Annotation is a crucial component of a good dataviz. It can turn a boring graphic into an interesting and insightful way to convey information. Dataviz is often divided into two main types: exploratory and explanatory. Annotation mainly serves the second type.

→ Text

The most common type of annotation is text. Let’s say you have a spike in a line plot. It totally makes sense to highlight it, and to explain in more detail what it is about.




Q2.1 - Build a line plot showing the evolution of the bitcoin price between 2013 and 2018. The dataset is located here and can be read directly with read.table(). What part of the chart would you highlight?

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)

# plot
data %>%
  ggplot( aes(x=date, y=value)) +
    geom_line(color="#69b3a2")




Q2.2 - Use the annotate() function to add text. annotate() requires several arguments:

  • geom: type of annotation, use text
  • x: position on the X axis
  • y: position on the Y axis
  • label: what you want to write
  • Optional: color, size, angle and more.
# plot
data %>%
  ggplot( aes(x=date, y=value)) +
    geom_line(color="#69b3a2") +
    annotate(geom="text", x=as.Date("2017-01-01"), y=19000, 
             label="Bitcoin price reached 20k $\nat the end of 2017")

→ Shape




Q2.3 - To highlight the spike even more, draw a circle around it. Note that you first need to find the exact date of the spike and its value.

# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)

# plot
data %>% 
  ggplot( aes(x=date, y=value)) +
    geom_line(color="#69b3a2") +
    ylim(0,22000) +
    annotate(geom="text", x=as.Date("2017-01-01"), y=20089, 
             label="Bitcoin price reached 20k $\nat the end of 2017") +
    annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21, fill="transparent")
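Instead of typing the spike date and value by hand, they can be computed and passed to annotate(). The sketch below shows the idea under an assumption: it uses the economics dataset shipped with ggplot2 (monthly US unemployment) rather than the bitcoin data, and peak is a hypothetical object name.

```r
library(ggplot2)

# economics ships with ggplot2: a date column and the number of
# unemployed people (in thousands) in the unemploy column
peak <- economics[which.max(economics$unemploy), ]

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#69b3a2") +
  # circle drawn exactly on the maximum
  annotate(geom = "point", x = peak$date, y = peak$unemploy,
           size = 10, shape = 21, fill = "transparent") +
  # label placed two years before the peak; hjust = 1 right-aligns the text
  annotate(geom = "text", x = peak$date - 730, y = peak$unemploy,
           label = "Highest unemployment\nin the series", hjust = 1)
p
```

Because the coordinates come from the data, the circle and the label stay on the spike if the dataset is updated.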

→ Abline




Q2.4 - Add a horizontal line to show which part of the curve is above 5000 $. This is possible thanks to the geom_hline() function, which requires a yintercept argument.

# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)

# plot
data %>% 
  ggplot( aes(x=date, y=value)) +
    geom_line(color="#69b3a2") +
    ylim(0,22000) +
    annotate(geom="text", x=as.Date("2017-01-01"), y=20089, 
             label="Bitcoin price reached 20k $\nat the end of 2017") +
    annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21, fill="transparent") +
    geom_hline(yintercept=5000, color="orange", size=.5)
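A threshold line is easier to read when it is labelled. The following sketch assumes the economics dataset shipped with ggplot2 and an arbitrary threshold; the names threshold and share_above are hypothetical.

```r
library(ggplot2)

# Arbitrary threshold, in thousands of unemployed people
threshold <- 10000

# Share of months above the threshold
share_above <- mean(economics$unemploy > threshold)

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#69b3a2") +
  geom_hline(yintercept = threshold, color = "orange", linetype = "dashed") +
  # label anchored at the left edge (hjust = 0) and pushed above the line (vjust < 0)
  annotate(geom = "text", x = min(economics$date), y = threshold,
           label = paste0("Threshold: ", threshold), hjust = 0, vjust = -0.5)
p
```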

→ Color




Q2.5 - Build a scatterplot based on the gapminder dataset. Use gdpPercap for the X axis, lifeExp for the Y axis, and pop for bubble size. Keep only the year 2007.

# Data are available in the gapminder package
library(gapminder)
data <- gapminder %>% filter(year == 2007) %>% select(-year)

# Basic scatterplot
ggplot( data, aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
    geom_point(alpha=0.7) 




Q2.6 - Highlight South Africa in the chart: draw it in red, with all other circles in grey. Follow these steps:

  • create a new column with mutate(): this new column has the value yes if country=="South Africa", no otherwise. This is possible thanks to the ifelse() function.
  • in the aesthetics part of the ggplot call, use this new column to control dot colors
  • use scale_color_manual() to set the two colors
# Basic scatterplot
data %>%
  mutate(isSouthAfrica = ifelse(country=="South Africa", "yes", "no")) %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = isSouthAfrica)) +
    geom_point(alpha=0.7) +
    scale_color_manual(values=c("grey", "red")) +
    theme(legend.position="none")
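In the code above, the colors are matched to the groups by position: "no" comes first alphabetically and gets grey, "yes" gets red. A named vector makes the mapping explicit. The sketch below assumes the built-in iris dataset and a hypothetical isSetosa column.

```r
library(ggplot2)

# Flag one group: "yes" for setosa, "no" for the other species
iris_flagged <- transform(iris, isSetosa = ifelse(Species == "setosa", "yes", "no"))

p <- ggplot(iris_flagged, aes(x = Sepal.Length, y = Petal.Length, color = isSetosa)) +
  geom_point(alpha = 0.7) +
  # names tie each color to a value, whatever the order of the levels
  scale_color_manual(values = c(yes = "red", no = "grey")) +
  theme(legend.position = "none")
p
```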

→ Multiple text




Q2.7 - Highlight in red every country with gdpPercap > 5000 & lifeExp < 60. Write their names using the geom_text_repel() function of the ggrepel package.

# ggrepel
library(ggrepel)

# prepare data
tmp <- data %>%
  mutate( annotation = ifelse(gdpPercap > 5000 & lifeExp < 60, "yes", "no"))

# plot
tmp %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
    geom_point(alpha=0.7) +
    theme(legend.position="none") +
    geom_text_repel(data=tmp %>% filter(annotation=="yes"), aes(label=country), size=4 )

→ Bonus

3- Faceting


→ facet_wrap()

# Libraries
library(babynames)

# Load dataset from the babynames package
data <- babynames %>% 
  filter(name %in% c("Ashley", "Amanda", "Jessica", "Patricia", "Linda", "Deborah", "Dorothy", "Betty", "Helen")) %>%
  filter(sex=="F")

# line plot = spaghetti chart
data %>%
  ggplot( aes(x=year, y=n, group=name, color=name)) +
    geom_line() +
    ggtitle("Popularity of American names in the previous 30 years")

data %>%
  ggplot( aes(x=year, y=n, group=name, fill=name)) +
    geom_area() +
    ggtitle("Popularity of American names in the previous 30 years") +
    theme(
      legend.position="none"
    ) +
    facet_wrap(~name, scales="free_y")

→ facet_grid()

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv", header=TRUE, sep=",")

# Plot
ggplot(data, aes(x=total_bill)) +
  geom_histogram() +
  facet_grid(sex~day)
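The difference between the two faceting functions used above can be summarized in a short sketch. It assumes the mpg dataset shipped with ggplot2; the choice of variables is arbitrary.

```r
library(ggplot2)

# facet_wrap(): one panel per level of a single variable,
# wrapped onto several rows; scales = "free_y" gives each panel its own Y axis
p_wrap <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, scales = "free_y")

# facet_grid(): rows defined by one variable (drv), columns by another (year)
p_grid <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 15) +
  facet_grid(drv ~ year)

p_wrap
p_grid
```

As a rule of thumb, facet_wrap() suits a single grouping variable with many levels, while facet_grid() suits the crossing of two variables.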

4- Saving plots





Q4.1 - Save the previous chart as a PNG file using the ggsave() function. Where is the file saved?

# save the plot in an object called p
p <- ggplot(data, aes(x=total_bill)) +
  geom_histogram() +
  facet_grid(sex~day)

# Save the plot
ggsave(p, filename = "chartFromRPractical.png")




Q4.2 - Specify the complete path before file name to save the chart at a specific location.
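A minimal sketch of both cases follows. It assumes the built-in mtcars dataset, and the destination folder (the temporary directory of the R session) is a hypothetical choice: replace it with your own path.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# A relative file name is resolved against the working directory
getwd()

# Hypothetical destination: a file in the session's temporary folder
out_file <- file.path(tempdir(), "chartFromRPractical.png")

# width and height are in inches by default; dpi sets the resolution
ggsave(filename = out_file, plot = p, width = 6, height = 4, dpi = 300)

# Check that the file has been written
file.exists(out_file)
```

Building the path with file.path() avoids hard-coding the separator, which differs between operating systems.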

5- Colors


→ One color
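A single color applies to the whole layer when it is set outside aes(). The sketch below, which assumes the built-in iris dataset, also shows a common pitfall: the same string placed inside aes() is treated as data, not as a color.

```r
library(ggplot2)

# Constant color: set outside aes(), applies to every point
p_constant <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "#69b3a2")

# Pitfall: inside aes(), the string is treated as a one-level variable,
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = "#69b3a2"))

p_constant
p_mapped
```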

→ Discrete color palette




Q5.1 - Build a scatterplot based on the iris dataset. Use Sepal.Length for the X axis, Petal.Length for the Y axis. Use color=Species to color groups.

ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
  geom_point()

ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
  geom_point() +
  scale_color_manual( values=c("red","green","blue"))




Q5.2 - Build the same scatterplot, but color the groups with a ColorBrewer palette using scale_color_brewer().

ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
  geom_point() +
  scale_color_brewer(palette = "Set3")

→ Continuous color palette




Q5.3 - Build the same scatterplot, but map color to the continuous variable Sepal.Length and apply a continuous palette with scale_color_distiller().

ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Sepal.Length)) +
  geom_point() +
  scale_color_distiller(palette = "RdPu")
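The viridis palettes are another option for both cases. The sketch below uses the scale functions that ship with ggplot2 itself, scale_color_viridis_d() for discrete variables and scale_color_viridis_c() for continuous ones; the "magma" option is an arbitrary choice.

```r
library(ggplot2)

# Discrete variable: one viridis color per species
p_discrete <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_color_viridis_d()

# Continuous variable: the option argument selects the color map
p_continuous <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Sepal.Length)) +
  geom_point() +
  scale_color_viridis_c(option = "magma")

p_discrete
p_continuous
```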

6- Interactive charts


An interactive chart is a chart on which you can zoom, hover over shapes to get tooltips, click to trigger actions, and more. Building interactive charts requires JavaScript under the hood, but it is relatively easy to do using R packages that wrap the JavaScript for you. Packages of this type are called HTML widgets.

→ Plotly




Q6.1 - Rebuild the gapminder bubble plot from the annotation part of this practical. Store it in an object called p.

# load data
library(gapminder)
data <- gapminder %>% filter(year == 2007) %>% select(-year)

# Basic ggplot
p <- data %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
    geom_point(alpha=0.7) 
p




Q6.2 - Install and load the plotly package. Build an interactive chart using the ggplotly() function. What are the new functionalities of this chart? Is it useful? What could be better?

# Interactive version
library(plotly)
ggplotly(p)




Q6.3 - Let’s improve the tooltip of the chart:

  • build a new column called myText. Fill it with whatever you want to show in the tooltip.
  • add a new aesthetic: text=myText
  • in the ggplotly() call, add tooltip="text"
# Basic ggplot
p <- data %>%
  mutate(myText=paste("This country is: " , country )) %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent, text=myText)) +
    geom_point(alpha=0.7) 

ggplotly(p, tooltip="text")
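The tooltip text can hold several lines. The sketch below is one way to build it, under a few assumptions: it uses the built-in mtcars dataset instead of gapminder, the column names model and myText are hypothetical, and lines are separated with <br>, since plotly renders tooltip text as simple HTML. The ggplotly() call only runs if plotly is installed.

```r
library(ggplot2)

# Copy of mtcars with the car names stored in a regular column
cars <- mtcars
cars$model <- rownames(mtcars)

# One tooltip per row, with three lines separated by <br>
cars$myText <- paste0(
  "Model: ", cars$model,
  "<br>Miles per gallon: ", cars$mpg,
  "<br>Weight (1000 lbs): ", cars$wt
)

p <- ggplot(cars, aes(x = wt, y = mpg, text = myText)) +
  geom_point(alpha = 0.7)

# Interactive version, showing only the custom text in the tooltip
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_p <- plotly::ggplotly(p, tooltip = "text")
}
```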

→ HTML widgets




Q6.4 - Plotly is not the only HTML widget. Visit this website for an overview of the kinds of interactive charts you can build with R.




BONUS - Use the HTML widget called dygraphs to build an interactive line plot of the bitcoin price. Try to reproduce the example below.

# Library
library(dygraphs)
library(xts)          # To convert between data frame and xts formats

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)

# Then you can create the xts format, and thus use dygraph
don <- xts(x = data$value, order.by = data$date)

# Use the dygraph HTML widget
dygraph(don) %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1)




BONUS - Use the HTML widget called leaflet to build an interactive map showing the earthquakes described in the dataset called quakes

# Library
library(leaflet)
 
# Load example data (Fiji earthquakes) and keep only the first 100 rows
data(quakes)
quakes <- head(quakes, 100)

# Create a color palette with handmade bins.
mybins <- seq(4, 6.5, by=0.5)
mypalette <- colorBin( palette="YlOrBr", domain=quakes$mag, na.color="transparent", bins=mybins)

# Final Map
leaflet(quakes) %>% 
  addTiles()  %>% 
  setView( lat=-27, lng=170 , zoom=4) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(~long, ~lat, 
    fillColor = ~mypalette(mag), fillOpacity = 0.7, color="white", radius=8, stroke=FALSE
  ) %>%
  addLegend( pal=mypalette, values=~mag, opacity=0.9, title = "Magnitude", position = "bottomright" )
 




A practical by Yan Holtz

yan.holtz.data@gmail.com